Dependencies

Part 1: Top 50 Cities with the Most Accumulated H-1B Filings (2018 Data)

Data Collection

Initial Data Cleaning

Column Name

Enough Samples?

Data Types

Missingness

Not a lot. We can drop them.
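
As a rough illustration of this step, the sketch below measures the share of missing values per column and then drops incomplete rows. The frame and its column names (`employer`, `base_salary`) are made-up stand-ins, not the actual H-1B data.

```python
import pandas as pd

# Toy frame; the column names are hypothetical stand-ins for the real data.
df = pd.DataFrame({
    "employer": ["A", "B", None, "D"],
    "job_title": ["ENGINEER", "ANALYST", "ENGINEER", "MANAGER"],
    "base_salary": [100000.0, None, 90000.0, 120000.0],
})

# Fraction of missing values in each column.
missing_share = df.isna().mean()

# Drop every row that has at least one missing value.
cleaned = df.dropna().reset_index(drop=True)
```

Dropping rows is only reasonable when `missing_share` is small, as it is said to be here; otherwise imputation may be worth considering.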

Clean job_title

Questions

Companies (Top 100)

The original data set contains both Google INC and Google LLC. After the company's reorganization, Google INC became Google LLC, so both names should be counted as Google LLC.
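
One possible way to merge the two Google spellings is a small alias table applied after basic normalization. This is a sketch only: the helper `normalize_employer`, the upper-casing, and the sample names are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

# Sample employer names (illustrative values only).
employers = pd.Series(
    ["GOOGLE INC", "Google LLC", "google inc.", "AMAZON.COM SERVICES INC"]
)

def normalize_employer(name: str) -> str:
    """Upper-case, trim whitespace and a trailing period, then apply known aliases."""
    cleaned = name.upper().strip().rstrip(".")
    aliases = {"GOOGLE INC": "GOOGLE LLC"}  # hypothetical alias table
    return aliases.get(cleaned, cleaned)

normalized = employers.map(normalize_employer)
```

An explicit alias table keeps the merge auditable; fuzzy matching would catch more variants but risks merging unrelated companies.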

Cities

Job Titles

2. What is the distribution of H-1B salaries for certified, withdrawn, and denied LCAs in 2018? What is the certified rate?

Case distribution

Certified rate
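
A minimal sketch of how a certified rate could be computed, assuming the status lives in a `case_status` column with values such as `CERTIFIED`, `DENIED`, and `WITHDRAWN` (the exact labels in the data may differ):

```python
import pandas as pd

# Illustrative statuses; the real column may use other labels.
case_status = pd.Series(
    ["CERTIFIED", "CERTIFIED", "DENIED", "WITHDRAWN", "CERTIFIED"],
    name="case_status",
)

# Share of each status among all cases.
status_share = case_status.value_counts(normalize=True)

# Certified rate = certified cases / all cases.
certified_rate = status_share["CERTIFIED"]
```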

3. Distribution of the length of the preparation period (from submission to start date)
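
The preparation period can be sketched as the number of days between the two dates. The column names `submit_date` and `start_date` are assumptions for this example.

```python
import pandas as pd

# Hypothetical column names and dates, for illustration only.
df = pd.DataFrame({
    "submit_date": ["2018-03-01", "2018-04-15"],
    "start_date": ["2018-09-01", "2018-10-01"],
})

submit = pd.to_datetime(df["submit_date"])
start = pd.to_datetime(df["start_date"])

# Days between submitting the application and the intended start date.
df["prep_days"] = (start - submit).dt.days
```

A histogram of `prep_days` would then show the distribution.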

4. Which positions have the highest median salaries among certified LCAs?

Without a minimum number of applicants

Applicants > 10
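
Requiring more than 10 applicants keeps a single highly paid filing from topping the ranking. A sketch of that filter, with assumed column names `job_title` and `base_salary` and made-up salaries:

```python
import pandas as pd

# Toy data: one common title and one rare, highly paid title.
df = pd.DataFrame({
    "job_title": ["DATA SCIENTIST"] * 12 + ["CHIEF EXECUTIVE"] * 2,
    "base_salary": [100000 + 1000 * i for i in range(12)] + [500000, 600000],
})

# Applicant count and median salary per title.
stats = df.groupby("job_title")["base_salary"].agg(
    applicants="count", median_salary="median"
)

# Keep titles with more than 10 applicants, highest median first.
ranked = stats[stats["applicants"] > 10].sort_values(
    "median_salary", ascending=False
)
```

In this toy data the rare title has the higher median but is excluded by the threshold.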

5. Explore the trend at Amazon over the last 10 years. How does the median salary trend compare with the number of applicants? What is the base salary distribution in each year?

Get data

Count and median base salary

Salary distribution by year

6. Which factors are predictive of an applicant obtaining LCA certification?

Data Processing

top100

Top companies

job20

job_tech

case_status

Machine Learning (Classification Task)

Explore the train set

Resplit

Performance (training set)

Let's look at performance on the training set.

Performance (test set)

Let's look at performance on the test set.

For the most predictive model, do the following:

Permutation
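
Permutation importance shuffles one feature at a time and records how much the model's score drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data with a random forest; the model and data are placeholders, not the notebook's most predictive model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only feature 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Mean drop in test accuracy when each feature is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
importances = result.importances_mean
```

Computing the importances on held-out data, as here, measures what the model relies on for generalization rather than what it memorized.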

PDP Plot

ICE Plot
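
ICE curves show, for each individual sample, how the prediction changes as one feature is varied; averaging them gives the partial dependence curve. A sketch with scikit-learn's `partial_dependence` on synthetic data (the model and features are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic data: only feature 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# kind="both" returns the per-sample (ICE) curves and their average (PDP).
pd_result = partial_dependence(
    model, X, features=[0], kind="both", grid_resolution=20
)
ice_curves = pd_result["individual"][0]  # shape: (n_samples, n_grid_points)
pdp_curve = pd_result["average"][0]      # shape: (n_grid_points,)
```

For plotting, `sklearn.inspection.PartialDependenceDisplay.from_estimator` accepts the same `kind` argument.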

Global Surrogate Models
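
A global surrogate is an interpretable model trained to mimic the predictions of a black-box model; its agreement with the black box is often called fidelity. The sketch below fits a shallow decision tree to a random forest's predictions on synthetic data. Both models are placeholders chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: only feature 0 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

# Black-box model whose behaviour we want to explain.
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
black_box_pred = black_box.predict(X)

# Surrogate: a shallow tree trained on the black box's predictions, not on y.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box_pred)

# Fidelity: how often the surrogate agrees with the black box.
fidelity = accuracy_score(black_box_pred, surrogate.predict(X))
```

A surrogate's explanations are only as trustworthy as its fidelity, so that number is worth reporting alongside the tree.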